Statistical and Computational Theory and Methodology for Big Data Analysis

Authors

  • Ming-Hui Chen
  • Radu Craiu
  • Faming Liang
  • Chuanhai Liu
Abstract

The integration of computer technology into science and daily life has enabled the collection of massive volumes of data, such as high-throughput biological assay data, climate data, website transaction logs, and credit card records. However, such big data sets cannot be practically analyzed on a single commodity computer, either because they are too large to fit in memory or because processing them with current statistical methods takes too long. To circumvent this obstacle, one may have to resort to parallel and distributed architectures, with multicore and cloud computing platforms providing access to hundreds or thousands of processors. While parallel and distributed architectures offer new capabilities for storing and manipulating data, it is unclear, from an inferential point of view, how current statistical methodology can be carried over to the big data paradigm. Moreover, growing size typically brings growing complexity in the data structures, in the patterns in the data, and in the models needed to account for those patterns. Big data thus poses a great challenge to current statistical methodology.

Similar resources

A statistical analysis framework for bus reliability evaluation based on AVL data: A case study of Qazvin, Iran

Reliability is a fundamental factor in the operation of bus transportation systems because it is a direct indicator of the quality of service and of the operator’s costs. Today, the application of GPS technology in bus systems makes big data available, though it brings the difficulty of preprocessing the data methodically. In this study, the principal component an...

Discrete Distribution Clustering in Big Data and a Method for Storm Prediction Leveraging Large

Big data brings new challenges and opportunities in many scientific areas today. Characterized by the high volume, velocity, and variety (3Vs) model, big data is valuable in many knowledge discovery applications but requires new methodologies and technologies to manage and make use of the data. In this dissertation, a fundamental methodology and an emerging application of big data are pres...

A Fuzzy TOPSIS Approach for Big Data Analytics Platform Selection

Big data sizes are constantly increasing. Big data analytics is the application of advanced analytic techniques to big data sets. Analytics based on large data samples reveals and leverages business change. The popularity of big data analytics platforms, which are often available as open source, has not gone unnoticed by big companies. Google uses MapReduce for PageRank and inverted indexes....

Introducing a positive thinking model in auditing based on grounded theory

The purpose of this study is to provide a positive thinking model in auditing based on grounded theory. The statistical population of the study comprised auditing experts, and the sample consisted of 14 experts selected by snowball sampling. Data were collected through semi-structured interviews and analyzed by content analysis with three-step coding in Maxqda software. For t...

Unbiased Bayes for Big Data: Paths of Partial Posteriors

Key quantities of interest in Bayesian inference are expectations of functions with respect to a posterior distribution. Markov Chain Monte Carlo is a fundamental tool to consistently compute these expectations via averaging samples drawn from an approximate posterior. However, its feasibility is being challenged in the era of so-called Big Data, as all data needs to be processed in every iterat...

Publication date: 2014